Nth attempt to resolve port collisions once-and-for-all #9850
kaleb-himes wants to merge 1 commit into wolfSSL:master
Pull request overview
This PR aims to eliminate flaky port collisions in CI by improving the random-port selection logic in multiple make check scripts, adding intra-script deduplication and a best-effort check against ports already bound on the system.
Changes:
- Add `used_ports` tracking to ensure ports handed out within a single script run are unique.
- Enhance `generate_port()` to retry on collisions and (when available) detect already-bound ports via `ss`/`netstat`.
- Apply the updated `generate_port()` logic across several OpenSSL/OCSP-related test scripts.
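The approach in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: only `generate_port` and `used_ports` are names taken from the summary, while `port_in_use`, the retry limit of 20, the 1024-65535 range, and the use of `/dev/urandom` are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of per-run dedup plus a best-effort "already bound" check.

used_ports=""   # space-separated list of ports handed out during this run

# Illustrative helper: succeeds if the port looks bound on this machine.
# Best effort only; reports "not in use" when neither ss nor netstat exists.
port_in_use() {
    local p="$1"
    if command -v ss >/dev/null 2>&1; then
        ss -lntu 2>/dev/null | grep -qE "[:.]${p}[[:space:]]" && return 0
    elif command -v netstat >/dev/null 2>&1; then
        netstat -an 2>/dev/null | grep -qE "[:.]${p}[[:space:]]" && return 0
    fi
    return 1
}

# Sets the global $port. Call it directly, not via $(...), because a
# subshell would lose the update to used_ports.
generate_port() {
    local attempts=0 candidate
    while [ "$attempts" -lt 20 ]; do
        # Two random bytes give 0..65535; fold that into 1024..65535.
        candidate=$(od -An -N2 -tu2 /dev/urandom | tr -d ' ')
        candidate=$(( candidate % 64512 + 1024 ))
        case " $used_ports " in
            *" $candidate "*) ;;   # already handed out in this run: retry
            *)
                if ! port_in_use "$candidate"; then
                    used_ports="$used_ports $candidate"
                    port=$candidate
                    return 0
                fi
                ;;
        esac
        attempts=$((attempts + 1))
    done
    return 1   # no free port found within the retry budget
}
```

A caller would run `generate_port` and then read `$port`, for example `generate_port; server_port=$port`. The system check is deliberately conservative: a `netstat -an` match on a foreign address also counts as "in use", which only costs an extra retry.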
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| scripts/rsapss.test | Adds per-run port dedup + system-bound port check in generate_port() |
| scripts/openssl_srtp.test | Adds per-run port dedup + system-bound port check in generate_port() |
| scripts/openssl.test | Adds per-run port dedup + system-bound port check in generate_port() |
| scripts/ocsp-stapling_tls13multi.test | Adds per-run port dedup + system-bound port check in generate_port() |
| scripts/ocsp-stapling2.test | Adds per-run port dedup + system-bound port check in generate_port() |
| scripts/ocsp-stapling.test | Adds per-run port dedup + system-bound port check in generate_port() |
    [ $collision -eq 0 ] && break
    ((attempts++))
In this script `set -e` is enabled, and `((attempts++))` will return a non-zero status the first time it runs (because the arithmetic expression evaluates to 0 before the increment). That can cause the whole test script to exit early on the first detected collision instead of retrying. Use an increment form that won't trip `set -e` (e.g., `attempts=$((attempts + 1))`), or explicitly ignore the status (e.g., `((attempts++)) || true`).
Suggested change:

    -((attempts++))
    +attempts=$((attempts + 1))
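The behaviour described in this comment can be reproduced in isolation. The sketch below assumes `bash` is on the `PATH`; `try_increment` is a hypothetical helper written for this illustration and is not part of the test scripts.

```shell
# Run one increment form in a child bash with `set -e` and report whether
# the child reached the end of its command list.
try_increment() {
    bash -c "set -e; attempts=0; $1; echo survived" 2>/dev/null || echo "exited early"
}

try_increment '((attempts++))'               # post-increment yields 0, so status 1
try_increment 'attempts=$((attempts + 1))'   # plain assignment, status 0
try_increment '((attempts++)) || true'       # status explicitly ignored
```

The first call prints `exited early` and the other two print `survived`, which is why the assignment form is the safer choice inside a retry loop.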
Description
I've been seeing port collisions in Jenkins again, in spite of bwrap having worked for us for many years now. This is an attempt to further reduce the probability of port collisions in the make check scripts that use random port generation.
This solution introduces the concept of "remembering ports assigned" in addition to checking "already assigned ports on the system".
Testing:
Many many cycles running openssl/ocsp scripts in tight loops. Probability of collisions in the worst case script (openssl) is estimated to have been:
1 collision per 292 runs of the openssl script (we test more runs than that in each PR due to the many config options)
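To see why a 1-in-292 rate matters once many runs are chained together, here is a back-of-the-envelope calculation. It assumes runs are independent, and the run count passed in is a hypothetical illustration, not a figure measured for this PR; `collision_chance` is a name made up for the sketch.

```shell
# Chance (in percent) of at least one collision across n independent runs,
# given an estimated rate of one collision per `runs_per_collision` runs.
collision_chance() {
    awk -v runs_per_collision="$1" -v n="$2" 'BEGIN {
        p = 1 / runs_per_collision            # per-run collision probability
        printf "%.1f\n", (1 - (1 - p) ^ n) * 100
    }'
}

collision_chance 292 1     # a single run
collision_chance 292 292   # hypothetical: 292 runs in one CI pass
```

Under these assumptions a single run collides about 0.3% of the time, but 292 runs together see at least one collision roughly 63% of the time.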
With this proposed change, the probability is estimated to drop from 0.343% to 0% inside a single run of a script and to ~0.006% inter-script (multiple copies of the script executing on the same machine). The new logic checks for ports already in use on the machine, but there is a small remaining probability that a port is free when checked and no longer free by the time it gets used, if another script grabs the same one in between.
I left the test scripts running all weekend without the fix and with the fix.
In the shell running the old code we saw 52 collisions over the course of 48 hours.
In the shell running with these changes in place we saw 0 collisions over the course of 48 hours.
I also created two "simulation" scripts to run many more tests in 10-minute intervals than can be achieved with the actual live TLS connections in 48 hours, and these are the results:
OLD SOLUTION w/ bwrap in place, using /dev/random for port values with no memory of assigned ports:
NEW SOLUTION w/ bwrap in place, using /dev/random for port values and having memory of assigned ports:
Checklist